[SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default#35204
[SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default#35204Peng-Lei wants to merge 1 commit intoapache:masterfrom
Conversation
|
@cloud-fan @imback82 Could you take a look? Thank you very much. |
There was a problem hiding this comment.
when v2 command work with v1 table. when we resolve the table. we load the table by V2SessionCatalog, then create V1Table(sessionCatalog.getTableMetadata), here when getTableMetadata, we will replace char/varchar with string in schema. So V1Table(t) t is alway with string type although type is char/varchar. So re-construct the schema of V1Table.
There was a problem hiding this comment.
I think we should restore the raw table schema in SHOW CREATE TABLE, not here.
There was a problem hiding this comment.
other v2 command works v1 table. maybe encounter this problem. but I will update as you said. thank you very much.
There was a problem hiding this comment.
is it simply identifier.asMultiPartIdentifier.quoted?
There was a problem hiding this comment.
TableIdentifier could not asMultiPartIdentifier, Identifier can asMultiPartIdentifier
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
We don't need to special-case v1 table. We should recognize these special partition transforms (bucket and sorted_bucket) and generate the CLUSTERED BY syntax for them.
@huaxingao have we unified this part? We discussed this before and the parser should generate partition transforms for bucket spec already, instead of BucketSpec
There was a problem hiding this comment.
Haven't changed this yet. Remember that
c.partitioning ++ c.tableSpec.bucketSpec.map(_.asTransform) didn't work inside AstBuilder because sortColumnNames needed to be empty. Now it should work. I can have a follow up to fix this.
643673e to
3b8bde2
Compare
55fd5e9 to
7b52ca9
Compare
There was a problem hiding this comment.
nit: we can turn this if into an assert, as this can never happen
There was a problem hiding this comment.
isn't it convered by TABLE_RESERVED_PROPERTIES?
There was a problem hiding this comment.
As PROP_EXTERNA is not include by TABLE_RESERVED_PROPERTIES. do you think it should be included by TABLE_RESERVED_PROPERTIES ?
There was a problem hiding this comment.
yea it should, it's indeed a reserved property
|
|
||
| private def showTableDataColumns(table: Table, builder: StringBuilder): Unit = { | ||
| val columns = table.schema().fields.map(_.toDDL) | ||
| val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL) |
There was a problem hiding this comment.
| val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL) | |
| val columns = CharVarcharUtils.getRawSchema(table.schema(), conf).fields.map(_.toDDL) |
We are within a physical plan, we can access conf here.
| if (tableOptions.nonEmpty) { | ||
| val props = tableOptions.toSeq.sortBy(_._1).map { case (key, value) => | ||
| s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'" | ||
| val props = SQLConf.get.redactOptions(tableOptions).toSeq.sortBy(_._1).map { |
| -- !query output | ||
| CREATE TABLE `default`.`tbl` ( | ||
| CREATE TABLE default.tbl ( | ||
| `a` FLOAT, |
There was a problem hiding this comment.
to be consistent, shall we also only quote the column name if needed? This is can be done in a new PR, as it's not related to v2 command migration.
|
@cloud-fan Update the PR. As the testcase failed #testcase. Maybe "Add the external to TABLE_RESERVED_PROPERTIES" in separate PR is more suitable? |
|
do you know why the test failed? |
without |
|
@Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it? I see that we exclude |
expectation of the tests:
At first I was really confused why filter |
|
@Peng-Lei We should update |
|
thanks, merging to master! |
…me if needed ### What changes were proposed in this pull request? Quote the column name just needed instead of anyway. ### Why are the changes needed? [#comments](#35204 (comment)) ### Does this PR introduce _any_ user-facing change? Yes,It will change the result that users get the schema eg: ``` "STRUCT<`_c0`: STRING, `_c1`: INT>" => "STRUCT<_c0: STRING, _c1: INT>" ``` At now. for end-user. I learn about the 3 way to get schema directly 1. the function: eg ``` schema_of_json schema_of_csv ``` 2. table schema df.schema or show create table 3. call `toDDL` for StructType or StructField. ### How was this patch tested? existed testcase. Closes #35227 from Peng-Lei/Quote-Column. Authored-by: PengLei <peng.8lei@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? Take `external` as a reserved table property. and do not allow use `external` for end-user when `spark.sql.legacy.notReserveProperties` == `false`. ### Why are the changes needed? [#disscuss](#35204 (comment)). keep it consistent with other properties like `location` `owner` `provider` and so on. ### Does this PR introduce _any_ user-facing change? Yes. end-user could not use `external` as property key when create table with tblproperties and alter table set tblproperties. ### How was this patch tested? existed testcase. Closes #35268 from Peng-Lei/SPARK-37950. Authored-by: PengLei <peng.8lei@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…n" property ### What changes were proposed in this pull request? This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables. This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue: 1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given. 2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed. ### Why are the changes needed? avoid a potential regression ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default. Closes #36498 from cloud-fan/regression. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…n" property ### What changes were proposed in this pull request? This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables. This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue: 1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given. 2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed. ### Why are the changes needed? avoid a potential regression ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default. Closes #36498 from cloud-fan/regression. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit fa2bda5) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
quoted(identifier: TableIdentifier)to quoted the table name of V1 command(SHOW CREATE TABLE[AS SERDE]) to match V2 behavior. It just work whenquoteIfNeededaddV2TablePropertiesofV1Table. Just whenexternal == true, we will addlocationproperty.V1Table.Schema, re-construct the original schema from the string.SHOW CRATE TABLEShowTablePartitionsto match V1 behavior.Why are the changes needed?
It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.
Does this PR introduce any user-facing change?
use V2 command as default for
show create tableif LEGACY_USE_V1_COMMAND == true
will use V1 command
How was this patch tested?
build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowCreateTableSuite"